Abstract

Upcoming HI surveys will deliver large datasets, and automated processing using the full 3-D information (two
positional dimensions and one spectral dimension) to find and characterize HI objects is imperative. In this context,
visualization is an essential tool for enabling qualitative and quantitative human control over an automated source finding
and analysis pipeline. We discuss how Visual Analytics, the combination of automated data processing and human
reasoning, creativity and intuition, supported by interactive visualization, enables flexible and fast interaction with
the 3-D data, helping the astronomer to deal with the analysis of complex sources. 3-D visualization, coupled to
modeling, provides additional capabilities that aid the discovery and analysis of subtle structures in the 3-D domain.
The requirements for a fully interactive visualization tool are: coupled 1-D/2-D/3-D visualization, quantitative and
comparative capabilities, combined with supervised semi-automated analysis. Moreover, to enable collaborative work,
the source code must be open, modular, well documented, and well maintained. We
review four state-of-the-art 3-D visualization packages, assessing their capabilities and their suitability for
3-D astronomical data.
1. Introduction


The Square Kilometre Array (SKA) and its precursors
are opening up new opportunities for radio astronomy in
terms of data collection and sensitivity. Two types of blind
surveys are planned with the SKA pathfinders:

1. shallow (very large sky coverage): WALLABY with
ASKAP (Johnston et al. 2008; Duffy et al. 2012), and the
shallow and medium-deep APERTIF surveys with
the WSRT (Verheijen et al. 2009);

2. deep (high sensitivity, small solid angle): CHILES
with the J-VLA (Perley et al. 2011; Fernandez et al.
2013); LADUMA with MeerKAT (Jonas 2009; Holwerda
et al. 2012); and DINGO with ASKAP (Johnston
et al. 2008; Duffy et al. 2012).


The first type of HI survey will detect ~10^3 sources
weekly, of which 0.2% will be well resolved,
6.5% will have a limited number of resolution elements,
and 93% will at best be marginally resolved (Duffy et al.
2012). This predicted weekly data rate is high, and fully
automated pipelines will be required for processing the
data (see section 3.2). The first and second categories of
sources will contain a wealth of morphological and kinematic
information. However, in cases with complex kinematics
it will be difficult to extract all information in
a controlled and quantitative way (Sancisi et al. 2008;
Boomsma et al. 2008). Therefore, manual analysis of a
subset of the resolved sources will still be required. In
fact, manual processing will be very useful for obtaining a
deeper insight into particular features of the data (e.g., tails
and extra-planar gas; see section 3.3). It will also help to
improve the automated pipelines. For
example, it can play a major role in the development and
training of machine learning algorithms for the automated
analysis of the data, in particular in the era of the full SKA
(see section 2.6).


The SKA pathfinders will provide a wealth of data, but
the expected exponential growth of the data volume creates
several challenges. We present a preview of the
data that APERTIF will deliver to the community in the
near future and discuss the importance of visualization for
the analysis of radio data in the upcoming era of large surveys.
Our discussion is based on existing mosaics acquired
with the Westerbork Synthesis Radio Telescope (WSRT),
which are representative of the daily image data rate
provided by future blind HI surveys.


1.1. WSRT and the APERTIF data 

The WSRT consists of a linear array of 14 antennas
with a diameter of 25 meters arranged on a 2.7 km East-West
line located in the north of the Netherlands. The
APERTIF phased array feed system is an upgrade to the
WSRT which will increase the field of view by a factor of 30
(Verheijen et al. 2008; Oosterloo et al. 2010). This will allow
a full inventory of the northern radio sky, complemented by
a wealth of optical and near-IR data and by data from other
radio observatories such as the Low-Frequency Array (LOFAR).
Part of the APERTIF surveys will be a medium-deep blind survey
of a few hundred square degrees with a 5σ column density
depth of 2-5 × 10^19 cm^-2.

A full 12-hour integration will provide ~2.4 TB of complex
visibilities sampling a 3° × 3° region of the sky, and the
subsequent data reduction will generate three-dimensional
data sets of the HI line emission, with axes right ascension
(RA or α), declination (DEC or δ), and frequency (ν)
or recession velocity (v). The typical size of a data cube
will be 2048 × 2048 pixels for the spatial coordinates (each
pixel covers 5 arcsec) and 16384 spectral channels, which
correspond to 16384 pixels in the third dimension covering
a bandwidth of 300 MHz (~60,000 km/s). The disk storage
needed for each data cube is about 0.25 TB, assuming
a single Stokes component, I, and a 32-bit pixel format.
The final product after observing the northern sky will be
of the order of 5 PB of data cubes.
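The storage and bandwidth figures above follow from simple arithmetic. The sketch below reproduces them from the quoted cube dimensions; the function names are hypothetical, and the velocity span assumes the radio convention (Δv = c Δν / ν0 at the HI rest frequency), which is our choice for illustration rather than something stated in the text.

```python
# Back-of-the-envelope sizes for one APERTIF HI cube, using the numbers
# quoted in the text: 2048 x 2048 spatial pixels, 16384 channels,
# single-Stokes 32-bit pixels, 300 MHz of bandwidth.

C_KM_S = 299_792.458        # speed of light [km/s]
NU_HI_MHZ = 1420.405751768  # rest frequency of the HI 21-cm line [MHz]


def cube_bytes(nx: int, ny: int, nchan: int, bytes_per_voxel: int = 4) -> int:
    """Disk size of a single-Stokes cube with fixed-size voxels."""
    return nx * ny * nchan * bytes_per_voxel


def band_width_km_s(bandwidth_mhz: float, nu0_mhz: float = NU_HI_MHZ) -> float:
    """Velocity span of a frequency band (radio convention: dv = c * dnu / nu0)."""
    return C_KM_S * bandwidth_mhz / nu0_mhz


if __name__ == "__main__":
    nbytes = cube_bytes(2048, 2048, 16384)
    print(f"voxels: {2048 * 2048 * 16384:.3e}")               # ~6.87e10
    print(f"size:   {nbytes / 1e12:.3f} TB = {nbytes / 2**40:.3f} TiB")
    print(f"band:   {band_width_km_s(300.0):,.0f} km/s")       # ~63,000 km/s
    print(f"chan:   {band_width_km_s(300.0 / 16384):.2f} km/s per channel")
```

The "0.25 TB" in the text is exactly 0.25 TiB (2^38 bytes), or about 0.27 TB in decimal units.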

Examining these numbers, it is clear that the storage,
data reduction, visualization and analysis needed to obtain
scientific results require the development of new tools and
algorithms which must exploit new solutions and ideas to
deal with this large volume of data. The terascale volume
of these datasets produces both technical issues
(e.g., data much larger than the available random access
memory (RAM) of a normal workstation) and visualization
challenges (e.g., the presence in each dataset of a large
number of small sources with limited signal-to-noise
ratio (SNR)).


1.2. Data visualization 

Traditionally visualization in radio astronomy has been 
used for: 

i) finding artefacts due to an imperfect reduction of the 
data; 

ii) finding sources and qualitatively inspecting them;

iii) performing quantitative and comparative analysis of 
the sources. 

In this paper we focus mainly on the connection between
interactive visualization and the automated source
finding and analysis pipeline (ii), and on the importance of
interactive, quantitative and comparative visualization (iii).
We will not discuss the visualization of artifacts (i) resulting
from imperfections in the data. Artifacts can arise from
several effects: Radio Frequency Interference (RFI), errors
in the bandpass calibration, or errors in the continuum
subtraction. Volume rendering can help to localize such
artifacts, but in that case visualization is envisaged to play
the role of assisting quality control of the products of an
automated calibration pipeline. This will be the subject
of a separate study, as it may require different tools.

In section 2 we give an overview of past and current
visualization packages and algorithms, with a focus
on radio astronomy. We highlight the 3-D nature of the
HI data in section 3. The requirements for a fully
interactive visualization tool are defined in section 4.
Finally, in section 5 we review state-of-the-art
visualization packages with 3-D capabilities. Our aim is to
define the basis for the development of a 3-D interactive
visualization tool.


2. Scientific visualization 


Scientific visualization is the process of turning numerical
scientific data into a visual representation that can be
inspected by eye. The concept of scientific visualization
was born in the 1980s (McCormick et al. 1987; Frenkel 1988;
DeFanti et al. 1989). Its role was never limited to
presentation alone (Roerdink 2013): the interactive processing
of the data, the imaging and analysis, including qualitative,
quantitative and comparative stages, is crucial for
achieving a deep and complete understanding.


In this section we provide background information about
past visualization developments in astronomy, scientific
visualization theory, visualization hardware and the software
terminology used in this paper.


2.1. Visualization in astronomy 

One of the first systematic radio astronomy visualization
trials was undertaken by Ekers and Allen (1975) (see also
Allen 1979; Sedmak et al. 1979; Allen 1985).
They investigated techniques for displaying single-image
data sets, including contour display, ruled surface display,
grey scale display, and pseudo-color display. They also
discussed techniques for the display of multiple image data
sets, including false-color display and cinematographic
display.


At the beginning of the 1990s, Mickus et al. (1990a),
Domik (1992), Mickus et al. (1990b), Domik and
Mickus-Miceli (1992), and Brugel et al. (1993) developed a
visualization tool named the Scientific Toolkit for Astrophysical
Research (STAR). STAR was a prototype resulting from
the development of a user interface and the implementation
of visualization techniques suited to the needs of astronomers
at that time. These included display of one- and
two-dimensional datasets, perspective projection,
pseudo-coloring, interactive color coding techniques, volumetric
data displays, and data slicing.


Recently, both Hassan and Fluke (2011) and Koribalski
(2012) pointed out the lack of a tool that can deal with
large astronomical data cubes. In fact, current astronomy
software packages are characterized by a windowed
interface for 2-D visualization of slices through the 3-D data
cube; in some cases limited 3-D rendering is also present.
Moreover, they can exploit only the resources of a personal
computer, which imposes strong limitations on the available
RAM and processing power. Examples of stand-alone
visualization tools are KARMA (Gooch 1996), SAOImage DS9
(Joye 2006), VisIVO (Comparato et al. 2007; Becciani
et al. 2010) and S2PLOT (Barnes and Fluke 2008). Other
viewers exist but are embedded in reduction and analysis
packages: GIPSY (van der Hulst et al. 1992; Vogelaar and
Terlouw 2001), CASA (McMullin et al. 2007) and AIPS
(Greisen 2003). A recent development is the use of the
open source software Blender for visualization of astronomical
data (Kent 2013; Taylor et al. 2014), but this
application is more suitable for data presentation than
for interactive data analysis.

From this inventory of the current state of the art we
conclude that the expected exponential growth of radio
astronomy data, both in resolution and in field of view, has
created a need for new visualization tools. In the
meantime much development has taken place in computer
science and medical visualization. We review relevant
software from these areas in sections 3 and 5.


2.2. 3-D visualization 

First investigations of the suitability of 3-D visualization
for radio-astronomical viewers date back to the early
1990s (Norris 1994). Already at that time it was
clear that a 3-D approach can provide a better understanding
of the 3-D domain of the radio data. The commonly used
type of data slicing (i.e., channel movies) forces the
researcher to remember what was seen in other channels
and requires a mental reconstruction of the data structure.
The major advantage of a 3-D technique is an easier
visual identification of all structure, including faint features
extending over multiple channels. A crucial point made
by Norris is that presenting the results qualitatively is fine
for data inspection, but that interactive and quantitative
hypothesis testing requires quantitative visualization.

In the last twenty years hardly any new 3-D visualization
tools have been developed for examining 3-D radio
astronomical data. In the mid-1990s, Oosterloo (1995)
investigated porting direct volume rendering techniques
to radio astronomy visualization. He analyzed the features
and the issues related to a ray casting algorithm (a
massively parallel image-order method, see Roth (1982)),
pointing out, in general, the advantages and drawbacks of
3-D visualization. He could not, however, develop a
real-time interactive 3-D software package due to the lack
of available computational resources.
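Ray casting is image-order: each output pixel owns one ray, and samples along that ray are composited into a single color. The rays are independent, which is why the method is massively parallel. The sketch below is a minimal illustration under our own assumptions: NumPy, orthographic rays running along the spectral axis of a cube stored as (channel, y, x), a hypothetical toy transfer function, and standard front-to-back emission-absorption compositing with early ray termination. The function name and parameters are hypothetical; this is not Oosterloo's (1995) implementation.

```python
import numpy as np


def raycast_front_to_back(cube, opacity_scale=0.1, vmin=0.0, vmax=None,
                          early_stop=0.99):
    """Orthographic emission-absorption ray casting along axis 0.

    `cube` is assumed to be stored as (channel, y, x), so every (y, x)
    pixel owns one ray that runs through all channels. The rays are
    independent; here NumPy advances all of them together, one sample
    plane per step.
    """
    cube = np.asarray(cube, dtype=np.float64)
    if vmax is None:
        vmax = float(cube.max())
    # Toy transfer function (hypothetical): the normalized intensity is
    # used both as emitted brightness and, scaled, as opacity.
    norm = np.clip((cube - vmin) / max(vmax - vmin, 1e-12), 0.0, 1.0)
    color = np.zeros(cube.shape[1:])  # accumulated brightness per ray
    alpha = np.zeros(cube.shape[1:])  # accumulated opacity per ray
    for k in range(cube.shape[0]):    # march from front (k = 0) to back
        a_k = np.clip(opacity_scale * norm[k], 0.0, 1.0)
        color += (1.0 - alpha) * a_k * norm[k]
        alpha += (1.0 - alpha) * a_k
        if np.all(alpha >= early_stop):  # early ray termination
            break
    return color, alpha
```

Front-to-back order lets a ray stop as soon as it is nearly opaque, since nothing behind an opaque sample can change the pixel.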


2.3. Volume rendering 

3-D visualization is the process of creating a 2-D projection
on the screen of the 3-D objects under study. This
process is called volume rendering. The rendering methods
are divided into two principal families: indirect volume
rendering (or surface rendering) and direct volume
rendering. The first approach fits geometric primitives
(e.g., polygonal iso-surfaces) to the data and then renders
them. It requires a pre-processing step on the dataset, after
which a quick rendering is possible. Fitting geometric
primitives, however, may introduce errors and rendering
artifacts. Moreover, not all datasets can be
easily approximated with geometric primitives, and HI
sources fall into this class because they do not have
well-defined boundaries. Furthermore, in an HI data cube the
signal-to-noise ratio is usually low. For example, the galaxies
in the WHISP survey (van der Hulst et al. 2001) have an
average signal-to-noise ratio of ~10 in the inner parts and ~1
in the outer parts. This makes indirect volume rendering
inefficient. Direct volume rendering methods render the
data defined on a 3-D grid directly (each element of
the grid is called a voxel), and therefore require more
computation to produce an image. Several direct rendering
solutions exist, and they are classified as: 1) object-order
methods, which iterate over the voxels and project them
onto the image plane; 2) image-order methods,
which instead iterate over the pixels of the final rendered
image and calculate how each voxel influences the color of
a single pixel; 3) hybrid methods, a combination of the first
two. It must be noted that during direct volume rendering
the depth information is combined in different ways
depending on the projection method used (e.g., maximum,
minimum, or accumulate). By rotating the volume, or by
using 3-D hardware, the user is able to mentally connect
the various frames and to register the proper 3-D structures.
For a detailed review of the state of the art and for more
information we refer to the Visualization Handbook
(Hansen and Johnson 2004) and the VTK book
(Schroeder et al. 2006, 4th edition).
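For the special case of orthographic rays parallel to one data axis, the three projection methods named above reduce to simple array reductions. This is a minimal NumPy sketch with a hypothetical function name. Arbitrary viewing directions, as used when rotating the volume, would require resampling the cube first, which is not shown.

```python
import numpy as np


def project(cube, method="max", axis=0):
    """Collapse a 3-D cube along one axis (orthographic projection)."""
    cube = np.asarray(cube, dtype=np.float64)
    if method == "max":         # maximum intensity projection (MIP)
        return cube.max(axis=axis)
    if method == "min":         # minimum intensity projection
        return cube.min(axis=axis)
    if method == "accumulate":  # sum of all samples along each ray
        return cube.sum(axis=axis)
    raise ValueError(f"unknown projection method: {method!r}")
```

For a cube stored as (channel, y, x), `project(cube, "accumulate", axis=0)` is proportional to the integrated-intensity (moment-0) map. The maximum projection instead keeps only the brightest channel along each line of sight, which makes compact sources stand out but discards the depth ordering.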


2.4. Out-of-core and in-core solutions

The rendering software can exploit an out-of-core or
an in-core solution. Out-of-core solutions are algorithms
designed to handle datasets larger than the
main system memory by using secondary, but much
slower, storage devices (e.g., hard disks) as an auxiliary
memory layer. These algorithms are optimized to
efficiently fetch or pre-fetch data from such secondary storage
devices to achieve real-time performance. They usually
use a multi-resolution data representation to facilitate
such a fast fetch mechanism and to support different
output resolutions, depending on the limits in terms
of processing time and available computational
resources (Rusinkiewicz and Levoy 2000; Crassin et al.
2009).

The in-core solution can achieve very fast memory transfer
because it does not need to access data stored on a
hard disk continuously. Instead, it assumes that the data
are stored in the main system memory, ready for processing.
Of course, in this case the main limitation is the size
of the available RAM.
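The out-of-core idea can be illustrated with a memory-mapped file, where the operating system pages in only the parts of the cube that are actually touched. The sketch below is minimal and rests on several assumptions. It uses NumPy. It assumes the cube sits on disk as a raw, C-ordered float32 array of shape (channel, y, x). A real tool would instead parse the FITS header or use a FITS library. The cube is processed one slab of channels at a time. `open_raw_cube`, `chunked_mip` and `downsample2` are hypothetical names. `downsample2` builds one level of a multi-resolution pyramid by 2×2×2 averaging. This is not the GigaVoxels or Hadwiger et al. (2012) implementation.

```python
import numpy as np


def open_raw_cube(path, shape, dtype=np.float32):
    """Map a raw, C-ordered binary cube on disk without reading it into RAM."""
    return np.memmap(path, dtype=dtype, mode="r", shape=shape)


def chunked_mip(cube, chunk=64):
    """Maximum intensity projection along axis 0, `chunk` planes at a time.

    Works for an in-memory array or a np.memmap; with a memmap only the
    slab currently being reduced has to be paged in from disk.
    """
    mip = np.full(cube.shape[1:], -np.inf)
    for start in range(0, cube.shape[0], chunk):
        slab = np.asarray(cube[start:start + chunk], dtype=np.float64)
        np.maximum(mip, slab.max(axis=0), out=mip)
    return mip


def downsample2(cube):
    """One coarser level of a resolution pyramid: mean of 2x2x2 blocks.

    Trailing odd planes are dropped. Shown in-core for brevity; an
    out-of-core renderer would build each level brick by brick.
    """
    nz, ny, nx = (n - n % 2 for n in cube.shape)
    c = np.asarray(cube[:nz, :ny, :nx], dtype=np.float64)
    return c.reshape(nz // 2, 2, ny // 2, 2, nx // 2, 2).mean(axis=(1, 3, 5))
```

For example, `chunked_mip(open_raw_cube("cube.f32", (16384, 2048, 2048)), chunk=32)` would hold only 32 channels (about 0.5 GB of float32 data) in memory at a time. The file name is hypothetical. The price is disk I/O, which is exactly the bandwidth problem out-of-core renderers are designed to hide.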


2.5. 3-D hardware 

The use of 2-D input and output hardware limits the
possible interaction with a 3-D representation. 3-D input
devices (such as a 3-D mouse or pointer) can naturally solve
this problem. Moreover, coupling them to a 3-D output
device, such as a 3-D monitor or a CAVE virtual environment,
can remove the difficulty of positioning a 3-D
cursor in 3-D space. In this case, the user can
see the real 3-D movement, instead of its projection on a
2-D screen. However, virtual reality has never been widely
used in researchers' daily work because it depends
on very expensive hardware not available on the consumer
computer market.

Recently, two very promising new devices have appeared, the
Leap Motion¹ (an input device that tracks the hands in 3-D)
and the Oculus Rift² (a 3-D output device offering a fully
immersive virtual reality experience), which could change
this situation because they are aimed at the gaming market
and are therefore rather cheap.

This hardware could open new perspectives for interaction
with volumetric data using a desktop solution. We
will, however, exclude these devices from our visualization
discussion, because the success, and therefore the
maintainability, of a visualization solution based on them
depends on the gaming market and is still uncertain. Moreover,
from the point of view of interface design, this new hardware
requires new interface concepts, and the expertise that
exists for classical interfaces such as mouse and keyboard
is still missing for them. This does not rule out that in the
coming years virtual reality may become very popular and
stimulate many developers to experiment with the Leap Motion,
the Oculus Rift, or future 3-D hardware.


2.6. Visual Analytics 

In the SKA era, manual inspection and analysis of even
a subset of the data will be extremely hard to achieve.
Machine learning will be needed for classification of the
different components of a galaxy (de la Calleja and Fuentes
2004; Banerji et al. 2010). However, the reliability of
analysis by machine learning depends heavily on the input
for the training session (Kuminski et al. 2014). Discovering
interesting relations, structures, and patterns in very
large and high-dimensional data spaces requires the combination
of automated data processing with human reasoning,
creativity and intuition, supported by interactive
visualization. Human assessment remains essential for
understanding the behavior of automatic algorithms and for
visual quality control. As the available data grow, effective
and efficient techniques are essential to increase our insight
into the underlying structures and processes.

Combining interactive visualization with analytic techniques
(machine learning, statistics, data mining) has grown
into a field of its own: Visual Analytics (Thomas and
Cook 2005; Keim et al. 2010). It aims to fully integrate
human expertise in the human-machine dialogue to steer
the sense-making process. Visual analytics supports
collaborative exploration and decision making by combining
fast access to large distributed databases, data integration,
powerful computing infrastructures, and interactive
visualization facilities (e.g., large touch displays). Astronomy
is an exciting and extremely demanding test field for
new visual analytics techniques. Data availability, storage
and distribution are well covered. Expert knowledge
is available to validate algorithmic approaches. Data-set
dimensionality (d = 10 ... 100) and sizes (> 10^9
elements) make scalability extremely difficult to achieve.
Extracting meaningful relations across the entire set of
data dimensions is inherently hard for data of high
dimensionality (Bertini 2011). Integrating data sources,
data-reduction algorithms, and expert knowledge to effectively
and efficiently answer domain-specific questions is an open
challenge. Visual analytics advocates a mixed approach:
automatically search datasets for potentially meaningful
patterns, and interactively steer data reduction and
visualization.

¹ https://www.leapmotion.com/
² http://www.oculus.com/


3. Visualization of HI data sets 

The domain of future radio surveys, such as those planned
with APERTIF, will fall in the Big Data domain for
two reasons:

i) a data cube will have dimensions of 2048 × 2048 ×
16384 ≈ 68.7 × 10^9 voxels (0.25 TB), and the data rate is
~10 cubes/week;

ii) each data cube will contain ~100 sources, i.e. galaxies,
of relatively small typical size (~ 10® voxels) in
the observed data volume of ~10^11 voxels.

A very important step is to condense the vast amount
of data collected by the surveys into a much smaller catalog
of interesting regions, the sources, and their properties.
This is done by examining the data itself. If done
manually, astronomers have to explore the whole data
set using visualization software in order to identify the
sources.


3.1. Visualization and source finding 

For illustrative purposes, we consider a mosaicked data
cube that serves as a pilot training set for future single
APERTIF pointings (Ramatsoku et al. 2015, in prep.).


The mosaic is built from 35 individual WSRT pointings
in a hexagonal grid, directed towards a region of the sky
where a filament of the Perseus-Pisces Supercluster
(PPScl) crosses the plane of the Milky Way. The data cube
covers a sky area of 2.4° × 2.4° centered at α = 72° and
δ = 45°. The redshift range is cz = 2000-17000 km/s.
The resulting data cube has dimensions of 2300 × 2300 pixels
for the spatial coordinates and 1717 pixels in the velocity
dimension. This is ~10 times smaller in the velocity
(frequency) dimension than a single APERTIF pointing, but
the spatial resolution, velocity resolution and sensitivity
are comparable. The number of objects is also comparable,
as Perseus-Pisces is an over-dense region. The ~200
sources comprise ~1% of the data volume. The minimum
column density detected is ~6.4 × 10^19 cm^-2 at the 3σ
level over a velocity range of 16.5 km/s.

The three-dimensional visual representation of the mosaic
in Figure 1 immediately highlights the sources' shape and
position in the data cube. Moreover, interactive operations
such as rotation, zooming and panning, and editable color
transfer functions greatly support manual identification of the
sources in the data cube.
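An editable color transfer function maps voxel intensity to color and opacity. The user moves a few control points, and the mapping is interpolated between them. This is a minimal sketch assuming NumPy and piecewise-linear interpolation between (value, r, g, b, a) control points. The function name and the example control points (in units of the noise σ) are hypothetical and not taken from any particular package.

```python
import numpy as np


def make_transfer_function(points):
    """Return a function mapping intensities to RGBA via control points.

    `points` is a list of (value, r, g, b, a) tuples, one per control
    point. Between control points the colors are linearly interpolated;
    outside their range they are clamped to the end values.
    """
    pts = np.asarray(sorted(points), dtype=np.float64)
    xs, rgba = pts[:, 0], pts[:, 1:]

    def apply(data):
        data = np.asarray(data, dtype=np.float64)
        out = np.empty(data.shape + (4,))
        for channel in range(4):  # r, g, b, a
            out[..., channel] = np.interp(data, xs, rgba[:, channel])
        return out

    return apply


if __name__ == "__main__":
    # Hypothetical control points in units of the noise sigma: voxels
    # below 3 sigma are fully transparent, bright emission is nearly opaque.
    tf = make_transfer_function([
        (0.0, 0.0, 0.0, 0.0, 0.0),
        (3.0, 0.0, 0.0, 1.0, 0.0),
        (6.0, 0.0, 1.0, 1.0, 0.3),
        (20.0, 1.0, 1.0, 0.0, 0.9),
    ])
    print(tf([1.0, 4.5, 50.0]))
```

Dragging a control point only changes one tuple, and rebuilding the lookup is cheap. This is what allows the mapping to be edited interactively while the volume is re-rendered.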

An interactive in-core ray casting algorithm running
on a cluster of Graphics Processing Units (GPUs) has been
proposed by Hassan et al. (2013) for the visualization of
terascale radio astronomy data cubes. More generally, many
large-scale visualization tools are being developed in the
context of computer science and medical imaging. Some
notable examples follow:

i) in-core solutions exploiting parallel computing on a
cluster: Moreland et al.; Vo et al. (2011);

ii) out-of-core solutions: Crassin et al. (2009) (GigaVoxels)
and Hadwiger et al. (2012).


In the case of visualization of HI in galaxies, it is, however,
unlikely that visualization of the full data cube will
be used for finding sources, for the following reasons:

1) the size of the HI blind survey data volume and the
number of sources, as illustrated in Figure 1, prohibit a
manual approach even when using very powerful interactive
visualization tools;

2) radio data are intrinsically noisy, and most sources are
faint and often extended. Spatial and/or spectral smoothing
increases the signal-to-noise ratio, depending on the
source structure. In fact, smoothing is applied on multiple
spatial and spectral scales to ensure that sources
of different sizes are extracted at their maximum integrated
signal-to-noise ratio. Figure 1 shows two visual
representations of the PPScl data, before and after the
source finding step, respectively. In both cases it
is possible to see a large number of sources, but many
of them are hardly visible in the original data cube because
they drown in the noise. Manual operations such
as zooming, changing the color function, and smoothing
help the observer to identify the sources visually.
This will, however, take a prohibitive amount of time
if done manually and will be impossible to perform if
such data cubes are delivered at a rate of 1-2 per day;

3) interactive rendering of ~10^11 voxels using an in-core
solution, as demonstrated by Hassan et al. (2013), requires
considerable resources for hardware and maintenance,
not affordable by typical research groups or major
observatories. An out-of-core solution can reduce
the financial demands on hardware. However, the
development of such a solution requires a huge programming
effort due to many challenges related to
I/O bandwidth limits. We refer to Crassin et al. (2009)
and Hadwiger et al. (2012) for a detailed description
of state-of-the-art out-of-core visualization algorithms,
including CPU-GPU memory transfer solutions. Note,
however, that none of the rendering pipelines cited here
are publicly available yet.
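The multi-scale smoothing described in item 2 can be sketched as a simplified smooth-and-clip finder. The cube is smoothed with boxcar kernels of several spatial and spectral widths, each smoothed copy is thresholded at a multiple of its own noise, and the union of the masks is kept. This is a NumPy-only sketch under our own assumptions: boxcar kernels, a median-absolute-deviation noise estimate, and a cube stored as (channel, y, x). It is not the Serra et al. (2012, 2015) implementation, which differs in detail, for example in kernel shapes and in its use of negative detections. All function names and default widths are hypothetical.

```python
import numpy as np


def boxcar(cube, width, axis):
    """Running mean of odd `width` along `axis` (zero padding at edges)."""
    if width <= 1:
        return cube
    if width > cube.shape[axis]:
        raise ValueError("kernel wider than the cube along this axis")
    kernel = np.ones(width) / width
    return np.apply_along_axis(np.convolve, axis, cube, kernel, mode="same")


def robust_rms(values):
    """Gaussian-equivalent noise from the median absolute deviation."""
    values = values[np.isfinite(values)]
    return 1.4826 * np.median(np.abs(values - np.median(values)))


def smooth_and_clip(cube, spatial=(1, 3), spectral=(1, 3, 7), threshold=4.0):
    """Union of masks of voxels above `threshold` x rms at several scales.

    `cube` is assumed to be ordered (channel, y, x). Each smoothed copy is
    thresholded against its own noise level, so faint extended emission
    that only stands out after smoothing is still picked up.
    """
    cube = np.asarray(cube, dtype=np.float64)
    mask = np.zeros(cube.shape, dtype=bool)
    for s in spatial:
        smoothed = boxcar(boxcar(cube, s, axis=1), s, axis=2)
        for v in spectral:
            sv = boxcar(smoothed, v, axis=0)
            mask |= sv > threshold * robust_rms(sv)
    return mask
```

A source with a per-voxel SNR of 0.8 is essentially invisible at a 4σ clip of the raw cube. After averaging over a 3 × 3 × 7 box, however, the noise drops by a factor of sqrt(63) ≈ 8 while the signal of an extended source is preserved. This is the SNR gain that item 2 refers to.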

Automated pipelines have been developed to extract
the source information from the collected data (Whiting
2012; Serra et al. 2015). Their goal is to find all reliably
detectable extragalactic HI objects in the observed data
volume, and to determine the properties of these objects,
that is:


a) the galaxies, i.e., the regularly rotating gas disks;

b) additional HI structures such as extra-planar gas and
tails. These are crucial for understanding the detailed
balance between gas accretion and gas depletion processes,
as well as their dependence on the environment,
and for obtaining the full picture of galaxy evolution.
For example, extra-planar gas data can be used to
quantitatively constrain the gas accretion and depletion
processes (see section 3.3.2). Another example is the
presence of tails in the data. Tails can be produced by
tidal interactions between galaxies (Fig. ) or by ram
pressure stripping (Oosterloo and van Gorkom 2005),
and are strong indications of these processes. Deciding
which process is important requires detailed inspection
and modeling of the features discovered in the data. We
refer to Sancisi et al. (2008) for a full review of the state
of the art of HI observations and their interpretation.
These features are located in the vicinity of the galaxies
and have low column densities and low signal-to-noise;

c) the faint HI in the cosmic web, such as HI filaments
between galaxies. This emission is expected to have
very low column density and very low signal-to-noise
in a single resolution element, so it will be difficult to
detect. It is probably extended, following the large-scale
structure, so the signal-to-noise could be increased
by smoothing. This is, however, unlikely to be sufficient
for detection (see below).
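Whether a candidate is "reliably detectable" can be quantified with the negative detections mentioned in section 3.2 (Serra et al. 2012). If the noise is symmetric, the number of negative detections with a given property estimates the number of false positives with that property. This is a minimal one-dimensional histogram sketch assuming NumPy. The function name, the binning, and the clipping of the reliability R to [0, 1] are our assumptions. Real source finders typically work in a multi-dimensional parameter space (for example peak flux, total flux, and number of voxels).

```python
import numpy as np


def reliability_curve(pos, neg, bins):
    """Fraction of positive detections that are real, per parameter bin.

    Assumes symmetric noise, so that the negative detections trace the
    false positives: R = (N_pos - N_neg) / N_pos in each bin. Bins with
    no positive detections are returned as NaN.
    """
    n_pos, edges = np.histogram(pos, bins=bins)
    n_neg, _ = np.histogram(neg, bins=edges)
    with np.errstate(divide="ignore", invalid="ignore"):
        rel = (n_pos - n_neg) / n_pos
    rel = np.where(n_pos > 0, np.clip(rel, 0.0, 1.0), np.nan)
    return rel, edges
```

Candidates falling in bins with low R are natural targets for the human inspection discussed in this section, rather than being accepted or discarded automatically.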

For inspecting (a) and (b), visualization techniques can
be used in the following approach: high-dimensional
visualization (e.g., 3-D scatter plots) of the parameters
provided by the pipeline and stored in catalogs (such as
position, flux, flux error, degree of asymmetry, velocity width,
integrated profile asymmetry, etc.) gives an overview of
the data and their 3-D domain (see section 4.4). Then,
manual inspection will be performed for only a subset of
sources, which can be delivered to a visualization analysis
package with full rendering capabilities for further analysis
(see section 3.3). In the case of (c), we should point out
that future observations with the SKA precursors, such
as APERTIF and ASKAP, will not achieve the sensitivity
to detect the cosmic web. The neutral fraction of cosmic
web filaments is expected to be very low, leading to HI
column densities ≲ 10^19 cm^-2 (Braun and Thilker 2004;
Ribaudo et al. 2011). We therefore do not focus on such
low-level, extended emission.
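The two-step approach described above (an overview of the catalog parameters, followed by manual inspection of a subset) might start with something as simple as flagging outliers in one catalog parameter and cutting a sub-cube around each flagged source. This is a minimal sketch assuming NumPy. The catalog field names, the median-absolute-deviation outlier rule, and the function names are all hypothetical. The default cut-out half sizes give at most 256 × 512 × 512 voxels in (channel, y, x), matching the maximum sub-cube size of 512 × 512 × 256 quoted in section 3.2.

```python
import numpy as np

# Hypothetical catalog layout: these field names are illustrative only.
CATALOG_DTYPE = [("id", "i4"), ("x", "i4"), ("y", "i4"), ("z", "i4"),
                 ("asymmetry", "f4")]


def select_for_inspection(catalog, field="asymmetry", n_sigma=2.0):
    """Return the catalog rows that are outliers in one parameter."""
    values = catalog[field].astype(np.float64)
    med = np.median(values)
    mad = 1.4826 * np.median(np.abs(values - med))
    return catalog[np.abs(values - med) > n_sigma * max(mad, 1e-12)]


def cutout(cube, center, half_size=(128, 256, 256)):
    """Sub-cube around `center` = (z, y, x), clipped at the cube edges."""
    slices = tuple(slice(max(c - h, 0), min(c + h, n))
                   for c, h, n in zip(center, half_size, cube.shape))
    return cube[slices]
```

Each selected row would then be passed on as `cutout(cube, (row["z"], row["y"], row["x"]))`. Because the sub-cube is small, it can be rendered interactively on a normal workstation.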
Figure 1: Two representations of the HI in galaxies in a filament of the Perseus-Pisces Supercluster (PPScl). The top-left panel
shows the rendering of the full data cube with a maximum intensity projection method. The top-right panel shows the data cube after a
semi-automated procedure that performs, with GIPSY routines, the smooth and clip procedure as implemented in Serra et al. (2015).
In both cases it is possible to see a large number of sources, but many of them are hardly visible in the top-left panel. The bottom panels
show two zooms. Smoothing has been applied to the bottom-left sub-cube, revealing some of the sources (circled) that are easily visible in the
bottom-right panel.
Figure 2: Three views of the volume rendering of a particular source
in the PPScl filament. The optical counterpart, WEIN069,
has been observed by Weinberger et al. (1995). The size of the data
cube containing the source is 73^3 ≈ 4 × 10^5 voxels. In the upper
panel we look along the frequency axis; in the middle panel along
the RA axis; and in the bottom panel the view is parallel to the
geometrical major axis of the galaxy. For computing the projection we
used an accumulate method. The different colors highlight different
intensity levels in the data.


3.2. Automated pipelines and human intervention

Automated pipelines will be responsible for finding the
sources, measuring parameters that indicate the
properties of each source, and creating catalogs. Source
finders are designed to automatically detect all the sources
in the field. To do so, they must employ an efficient
mechanism to discriminate between such interesting
regions and the noise. The peak flux, total flux, and
number of voxels are parameters that can be used to
determine the completeness and reliability of detected
sources when examining both positive and negative
detections (Serra et al. 2012). Due to the complex 3-D
nature of the sources (Sancisi et al. 2008) and the noisy
character of the data, it is, however, not trivial to construct
a fully automated and reliable pipeline. A review
of the current state of the art is given by Popping et al.
(2012), who describe the issues connected with the noisy
nature of the data, and the various methods and their
efficiency. In addition, automated source characterization
and measurement of source parameters are required for
producing catalogs with science-ready products. Human
inspection will be necessary for quality control of the
results from the pipelines, and in particular for investigating
complex cases. The human mind is, in fact, a very
powerful diagnostic instrument which can naturally recognize
(source) structures in the data. For example, in a
significant number of cases, it will be very difficult to
automatically retrieve information about particular features
such as tidal tails or stripped HI. APERTIF will most likely
deliver 2 or 3 of these cases every day (an estimate based on
the data shown in Figure 1). The analysis of these will still be
done manually, and visualization will still play a major role.
In fact, automated algorithms are built on the knowledge
acquired during the manual approach (see section 2.6 for
the role of interactive visualization and machine learning
in visual analytics). Moreover, coupling visualization tools
with semi-automated data analysis techniques is necessary
in order to improve the inspection itself.

The subcubes containing the sources detected by source
finders will be relatively small, with maximum sizes of
512 × 512 × 256 ≈ 0.067 × 10^9 voxels, reducing the local
storage, I/O bandwidth, and computational demand for
visualization (easily achievable on a modern computer).


3.3. Visualization and source analysis

In this section we show in detail, using visualization
examples, the character of the 21-cm radio emission
of galaxies and the benefits and drawbacks of adopting
3-D visualization, as already pointed out by Norris (1994)
and Oosterloo (1995) (see also Goodman et al. 2009).

The use of 3-D visualization of HI in galaxies is still in
its infancy. Existing astronomical 3-D visualization tools
lack interactivity and the ability to perform quantitative
analysis. The lack of interactivity is mainly a result of
the lack of computing power to date, as volume rendering
is computationally expensive. Moreover, the use of 2-D
input and output hardware limits the interaction with a 


3-D representation (see section 2.5). Therefore, the inter¬ 
pretation of a 3-D visual representation has never been 
investigated thoroughly. Additional complication is that 
the 3-D structure of the HI in a cube is not in a 3-D spatial 
domain. The third axis represents velocity and thus the 
3-D rendering delivers a mix of morphological, kinematical 
and geometrical information. Therefore, 3-D visual ana¬ 
lytics has never been developed for HI data. These are 
the main reasons why the development of 3-D visualiza¬ 
tion as a tool for inspecting, understanding and analysing 
radio-astronomical data has been slow. Currently avail¬ 
able hardware, e.g. GPUs, now enable interactive volume 
rendering, stimulating further development. 

3-D visualization techniques can provide many insights about the source under study. In Fig. 2, the three-dimensional visualization of a particular source in the PPScl filament, discussed earlier, shows a 3-D view of its HI distribution and kinematics, providing an immediate overview of the structures in the data. Two main components are visible in Fig. 2: a central body, which is the regularly rotating disk of the galaxy, and a tail, which is unsettled gas resulting from tidal interaction with another galaxy. The 3-D structure of the HI data is, however, difficult to interpret for several reasons: i) the third axis of a data cube is frequency, which is converted into a velocity by applying the Doppler formula to the 21-cm HI line; ii) the measured velocities are the line-of-sight velocity components of a rotating system, therefore the 3-D shape depends directly on the rotation curve; iii) in addition, the kinematic information of the gas is affected by geometric properties such as inclination, orientation of the semi-major axis, and gas distribution. Due to these complexities in the data, the user of a 3-D inspection tool needs reasonable experience with the data and a certain learning period to assimilate the tool itself. This is not different from the situation 25 years ago, when radio astronomers had to train themselves to understand 2-D visual representations such as movies of channel maps and position-velocity diagrams. During this learning process interactivity is a key factor (see sections 3.3.1 and 3.3.2).
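The frequency-to-velocity conversion in point i) can be illustrated with a minimal Python sketch. It assumes a linear spectral axis described by the standard FITS keywords CRVAL3, CDELT3 and CRPIX3 (CRPIX is 1-based) and the 21-cm rest frequency of 1420.405751768 MHz; the function names and the example axis values are hypothetical. Both the radio and the optical Doppler conventions are shown, because which one applies depends on how a given cube was produced.

```python
import numpy as np

C_KMS = 299792.458              # speed of light [km/s]
HI_REST_HZ = 1.420405751768e9   # rest frequency of the 21-cm HI line [Hz]

def channel_frequencies(n_chan, crval3, cdelt3, crpix3):
    """Frequencies [Hz] of a linear spectral axis (FITS convention: CRPIX is 1-based)."""
    pix = np.arange(1, n_chan + 1)
    return crval3 + (pix - crpix3) * cdelt3

def radio_velocity(freq_hz, rest_hz=HI_REST_HZ):
    """Radio convention: v = c (f0 - f) / f0 [km/s]."""
    return C_KMS * (rest_hz - freq_hz) / rest_hz

def optical_velocity(freq_hz, rest_hz=HI_REST_HZ):
    """Optical convention: v = c (f0 - f) / f [km/s]."""
    return C_KMS * (rest_hz - freq_hz) / freq_hz

# Hypothetical 5-channel axis: 1.41 GHz at the reference pixel, 36.6 kHz channels.
freqs = channel_frequencies(5, crval3=1.41e9, cdelt3=-36.6e3, crpix3=3)
v_radio = radio_velocity(freqs)
v_opt = optical_velocity(freqs)
print(np.round(v_radio, 1))
print(np.round(v_opt, 1))
```

In this example the channel width is negative, so the velocity increases with channel number; for redshifted emission the optical convention gives systematically larger velocities than the radio one.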


The 3-D visualization paradigm (volume rendering) described and used in this paper is limited by the use of 2-D input and output hardware such as a standard monitor and mouse. A simple practical example of a limitation in 3-D is the absence of a method for picking the value of one pixel with a cursor. Complementary visualization in 1-D and 2-D can repair these deficiencies. Moreover, there is not a single best way to visualize a radio data cube, but the combination of several methods (3-D, 2-D, 1-D, side by side, overlaid, blinking, etc.) and the interaction between them could deliver a very powerful analysis tool. It is important to view the data in different ways; this is the key to fully assimilating the information in the data. Therefore, a high level of linking between 1-D, 2-D and 3-D views must be achieved.

Figure 3: Another view of the source in Fig. 2 is shown. The blue surface represents the full-resolution data, while the green is the smoothed version at 60" spatial resolution. Both surfaces are representations of the signal at 3σ. The green surface shows a very faint filamentary structure that connects the two galaxies.

Very faint coherent signals, under 3σ, are difficult to find even using 3-D. Real-time smoothing can help in dealing with the noisy character of the data. In fact, if the signal is comparable to the noise, which will be the case for many APERTIF observations, it is not possible to distinguish the signal itself from the noise at full resolution in any way. Fig. 3 shows that only in the smoothed (60") version of the same data (in this case the signal-to-noise ratio of the filament is increased from ~1 to ~4) is it possible to localize a very faint filamentary structure that connects the two galaxies. It is already possible to detect the filament after smoothing to a spatial resolution of 30" (signal-to-noise of ~2).

In the following use cases we will show how 3-D interactive visualization helps in the analysis of the sources.

3.3.1. Use Case A: analysis of sources with tidal tails

Fig. 4 explores the source shown in Figs. 2 and 3 in more detail. A big tail due to a gravitational interaction is clearly present in the data cube. It is very easy to recognize the tail in the volume rendering because the data are coherent in all three dimensions.

In the case of HI in galaxies one can extract additional information from fitting the observations with the so-called tilted-ring model (Warner et al.) (e.g., TiRiFiC (Józsa et al. 2007) and Barolo (Di Teodoro and Fraternali 2015, in prep.)), which produces a model data cube simulating the observed HI distribution of the galaxy as a set of concentric, but mutually inclined, rotating rings; the model is then compared directly to the observation. This operation can give a deeper knowledge of the kinematics and morphology: asymmetries in surface density and velocity, presence of extra-planar gas, presence of inflows and outflows, gas at anomalous velocities, etc. However, these algorithms cannot recognize tidal tail structures and separate them from the central regularly rotating body of the galaxy. Combining 3-D visualization with these algorithms through a 3-D selection tool (e.g. Yu et al. 2012) will be very powerful. As shown in Fig. 4, separating the components visually enables a better view and a better understanding compared to the visual representation shown in the middle panel of Fig. 2.

Figure 4: Another two views of the source in Fig. 2 are shown. The blue surface is a manual selection of the tidal tail.

A 3-D selection tool will not only be useful for highlighting the different components with different colors, but also for retrieving quantitative information (noise calculation, HI mass, velocity gradient, tilted-ring model fitting, etc.) on the selected volume. For example, in the case of this PPScl source the user can separate the components and perform the calculations separately on the two volumetric selections. In this process, the key feature is the interactivity of the process itself.
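As an illustration of quantitative measurements on volumetric selections, the following sketch computes the noise and the HI mass of two boolean masks standing in for the disk and the tidal tail. It assumes optically thin emission, a cube in Jy/beam, a Gaussian beam, and the standard relation M_HI = 2.356 x 10^5 D^2 S_int (D in Mpc, S_int in Jy km/s); the cube, masks, beam, channel width, and distance are all hypothetical.

```python
import numpy as np

def beam_area_pixels(bmaj_arcsec, bmin_arcsec, pix_arcsec):
    """Solid angle of a Gaussian beam in pixels: pi * bmaj * bmin / (4 ln 2) / pix^2."""
    return np.pi * bmaj_arcsec * bmin_arcsec / (4.0 * np.log(2.0)) / pix_arcsec**2

def hi_mass(cube_jy_beam, mask, dv_kms, beam_pix, dist_mpc):
    """HI mass [Msun] of the voxels selected by a boolean mask.

    Assumes optically thin emission: M_HI = 2.356e5 * D^2 * S_int,
    with D in Mpc and S_int the integrated flux in Jy km/s.
    """
    s_int = cube_jy_beam[mask].sum() * dv_kms / beam_pix  # Jy km/s
    return 2.356e5 * dist_mpc**2 * s_int

# Synthetic (velocity, y, x) cube in Jy/beam with two hypothetical components.
rng = np.random.default_rng(1)
cube = rng.normal(0.0, 1e-3, size=(48, 64, 64))
disk = np.zeros(cube.shape, dtype=bool)
disk[10:30, 24:40, 24:40] = True
tail = np.zeros(cube.shape, dtype=bool)
tail[30:40, 40:56, 30:36] = True
cube[disk] += 4e-3
cube[tail] += 1e-3

beam = beam_area_pixels(30.0, 30.0, 6.0)   # hypothetical 30" beam, 6" pixels
noise = cube[~(disk | tail)].std()         # noise outside both selections
m_disk = hi_mass(cube, disk, 4.1, beam, dist_mpc=70.0)
m_tail = hi_mass(cube, tail, 4.1, beam, dist_mpc=70.0)
print(f"noise = {noise:.2e} Jy/beam, M_disk = {m_disk:.2e}, M_tail = {m_tail:.2e} Msun")
```

In an interactive tool the masks would come from the 3-D selection itself, so the same two calls can be repeated each time the user refines a selection.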


3.3.2. Use Case B: modeling feedback 

It has been shown that the gas distribution of some spiral galaxies (e.g., NGC2403, shown in Fig. 5) is not composed of just a cold regular thin disk. Stellar winds and supernovae can produce extra-planar gas (e.g., a galactic fountain; Bregman 1980). In this case, modelling can be used to constrain the 3-D structure and kinematics of the extra-planar gas, which is visible in the data as a faint kinematic component in addition to the disk. 3-D visualization of both the data and the model can provide a powerful tool to investigate such features. The visualization tool could use the output model of automated model-fitting algorithms for visually highlighting the different components in the data cube. In fact, if the model of the cold thin disk is subtracted from the data, it is immediately possible to locate any uncommon features in the data cube of interest and to get a first idea of their properties, directing further modeling. For example, a model of the extra-planar gas above or below the disk with a slower rotation and a vertical motion provides quantitative information about the rotation and the infall velocity of such gas.
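The model-subtraction step can be sketched as follows: subtracting a thin-disk model from the data and thresholding the residual at a few times the noise isolates candidate extra-planar gas. This toy example uses a single synthetic line profile and a hypothetical lagging component instead of a full model cube produced by a fitting code such as Barolo; the function works unchanged on 3-D arrays.

```python
import numpy as np

def anomalous_gas(data, model, sigma, threshold=3.0):
    """Return residual = data - model and a mask of voxels where the data exceed
    the disk model by more than threshold * sigma (candidate extra-planar gas)."""
    residual = data - model
    return residual, residual > threshold * sigma

# Toy spectrum: a Gaussian "thin disk" profile plus a fainter, more slowly
# rotating (lagging) component that the disk model does not contain.
v = np.linspace(-200.0, 200.0, 81)                          # km/s
disk_model = 10.0 * np.exp(-0.5 * ((v - 100.0) / 10.0) ** 2)
lagging = 2.0 * np.exp(-0.5 * ((v - 60.0) / 20.0) ** 2)
sigma = 0.3
rng = np.random.default_rng(2)
data = disk_model + lagging + rng.normal(0.0, sigma, v.size)

residual, flag = anomalous_gas(data, disk_model, sigma)
print("flagged velocities:", v[flag])
```

Only voxels where the data exceed the model are flagged; strongly negative residuals (model brighter than the data) would instead point to a poor fit and deserve a separate check.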

In Fig. 5, the data of the NGC2403 observations are colored in green, while the blue structure is a tilted-ring model of regular rotation automatically fitted to the data with Barolo. The top panel in Fig. 5 represents the position-velocity diagram along the semi-major axis, which shows the typical rotation curve of a late-type galaxy plus some unsettled gas in the inner region. The middle panel is a 3-D representation of the data, but it is very difficult to distinguish between the cold disk and the extra-planar gas: too much information is condensed in that visual representation. Separating and visually highlighting the different kinematic components, as in the bottom panel, clearly shows the extra-planar gas. 3-D visualization gives an immediate overview of the coherence of the emission; for example, it highlights the presence of extra-planar gas and its extent. On the other hand, for checking the data pixel by pixel it is better to use a two-dimensional representation like a position-velocity diagram.


4. Prerequisites for visualization of HI 


Goodman (2012) has already stated that a visualization environment for astronomy should satisfy:

i) interactivity;

ii) linked views with different representations of the data (2-D, 3-D and high-dimensional visualization);

iii) availability of an open source repository and a high level of modularity in the source code for enabling collaborative work;

iv) interoperability with Virtual Observatory (VO) tools through the Simple Application Messaging Protocol (SAMP; Taylor et al. 2011).


These requirements are also valid in our case, the visualization of HI in galaxies. Moreover, the interface must be able to handle astronomical world coordinates. This is of primary importance for many applications, such as overlaying images taken at different wavelengths with other telescopes, cross-correlating source positions and velocities with existing catalogs, etc. A full overview of the representation of celestial coordinates in FITS and related issues is given in Calabretta and Greisen (2002) and Greisen et al. (2006).

From section 3.2 we concluded that the data cubes of interest will have dimensions < 10^8 voxels (< 0.25 GB), but a large number (~100,000) of small sources will be delivered by the surveys. Therefore, to extract the information from the data quickly and present it in a clear and synthetic form, the visualization must be qualitative, quantitative, and comparative. In the next three subsections we describe these demands and why we need three levels of visualization.
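The size figures quoted above can be checked with a short calculation, assuming 32-bit floating-point voxels (4 bytes) and taking 1 GB as 2^30 bytes; the helper name is hypothetical.

```python
def cube_footprint(shape, bytes_per_voxel=4):
    """Number of voxels and memory [GiB] of a cube (4 bytes/voxel = 32-bit floats)."""
    n_vox = 1
    for n in shape:
        n_vox *= n
    return n_vox, n_vox * bytes_per_voxel / 2**30

n_vox, gib = cube_footprint((512, 512, 256))
print(n_vox, gib)                    # 67108864 0.25  (largest expected subcube)
print(100_000 * gib / 1024, "TiB")   # 24.4140625 TiB (upper bound for ~100,000 subcubes)
```

The second number is only an upper bound, since most subcubes will be much smaller than the maximum size; the point is that each individual cube fits comfortably in the memory of a desktop GPU.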


Figure 5: Three different illustrations of the HI data of NGC2403 from the THINGS survey (Walter et al. 2008) are shown. The galaxy is very well resolved. The top panel represents the position-velocity diagram along the semi-major axis, which shows the typical rotation curve of a late-type galaxy (the blue contours represent the model that fits the regular disk) plus some unsettled gas in the inner region (the lowest green contour of the data is at 3σ). The middle and bottom panels illustrate two 3-D representations of the data using an accumulate projection method.


4.1. Qualitative visualization

First of all, astronomers want to look at their data in various ways in order to assess the data quality. An experienced astronomer can distinguish faint sources from the noise and instrumental artefacts, recognize the morphology and the kinematics of a galaxy, and identify unexpected HI emission (e.g., very faint structures such as extra-planar gas, tidal tails, and ram-pressure filaments). Therefore, qualitative visualization will continue to play a major role.

In the previous section we showed the advantages and the drawbacks of adopting 3-D visualization. Very fast interactivity in rendering, in 3-D navigation, in data smoothing, and in quantitative and comparative functionality is important: if the interactivity is too slow, only the obvious signal will be found and subtle features may remain unnoticed. More precisely, the visualization should have a user-friendly interface capable of sustaining navigation at more than 15 fps, in order to provide the user with fast interaction such as rotation, zooming, and panning of the data.

The interface should have the capability to change the 
transfer function (i.e., mapping the value of the projected 
voxels onto a color and transparency value) interactively 
to help the astronomer in the qualitative understanding of 
the data, both in the 2-D and 3-D visualization. 
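A transfer function of this kind can be sketched as a piecewise-linear mapping from voxel values to RGBA. The control points below, in units of the noise σ, are hypothetical choices: emission below 3σ is fully transparent and brighter voxels become progressively more opaque. Interactive editing then amounts to moving the control points and re-rendering.

```python
import numpy as np

def apply_transfer_function(values, points, rgb, alpha):
    """Map voxel values to RGBA by piecewise-linear interpolation between control points.

    points: increasing data values of the control points; rgb: (n, 3) colors in [0, 1];
    alpha: (n,) opacities. Values outside the range take the end-point color/opacity.
    """
    v = np.asarray(values, dtype=float)
    rgba = np.empty(v.shape + (4,))
    for c in range(3):
        rgba[..., c] = np.interp(v, points, rgb[:, c])
    rgba[..., 3] = np.interp(v, points, alpha)
    return rgba

# Hypothetical transfer function in units of the noise sigma: transparent below
# 3 sigma, then from faint green to opaque white at 10 sigma.
sigma = 1.0
points = np.array([2.99, 3.0, 10.0]) * sigma
rgb = np.array([[0.0, 0.0, 0.0], [0.0, 0.8, 0.0], [1.0, 1.0, 1.0]])
alpha = np.array([0.0, 0.05, 0.8])
rgba = apply_transfer_function(np.array([0.5, 3.0, 6.5, 20.0]) * sigma, points, rgb, alpha)
print(rgba)
```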

The user should also be able to choose different line-of-sight integrations during the projection for the volume rendering (e.g., minimum, maximum, accumulate). For example, to visualize HI absorption, which appears as a negative line, a minimum transfer function is needed, while to see the HI emission in galaxies one can use a maximum or a very specific accumulate transfer function.
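The projection modes can be sketched with NumPy, under the simplifying assumption that rays run parallel to one array axis (for instance the velocity axis); an actual volume renderer resamples the cube along arbitrary viewing rays, usually on the GPU. The accumulate mode is shown both as a plain sum and as standard front-to-back emission-absorption compositing, in which the opacities from a transfer function attenuate the samples behind them.

```python
import numpy as np

def project(cube, mode="max", axis=0):
    """Line-of-sight projection of a cube along one array axis."""
    if mode == "max":
        return cube.max(axis=axis)    # emission: brightest voxel on each ray
    if mode == "min":
        return cube.min(axis=axis)    # absorption: most negative voxel on each ray
    if mode == "accumulate":
        return cube.sum(axis=axis)    # simple additive accumulation
    raise ValueError(f"unknown projection mode: {mode}")

def composite_front_to_back(color, alpha):
    """Emission-absorption compositing along axis 0 (sample 0 nearest the viewer):
    C = sum_i c_i a_i prod_{j<i} (1 - a_j)."""
    transmittance = np.cumprod(1.0 - alpha, axis=0)
    in_front = np.concatenate([np.ones_like(alpha[:1]), transmittance[:-1]], axis=0)
    return (color * alpha * in_front).sum(axis=0)
```

A maximum projection hides absorption entirely, which is why the choice of mode has to be left to the user.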

4.2. Quantitative visualization

Interactive quantitative visualization, which allows the user to extract quantitative information directly from the visual presentation, is of primary importance. In astronomy, and in particular in radio astronomy, this is not a new concept. For example, the KARMA package is a very good quantitative tool in the framework of 1-D and 2-D visualization. The KARMA developers showed that a first level of quantification is to retrieve numbers from the visualized dataset and in some cases to represent them in a visual way for a better understanding. Examples are:

i) display of the flux value at a pixel in a slice view and/or plotting of intensity profiles with display of the values;

ii) calculation of noise, standard deviation, maximum, minimum, HI mass or velocity gradient, etc., in a specific area or volume;

iii) segmentation of the 3-D data volume of an object;

iv) construction and display of moment maps and position-velocity diagrams.
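Items i), ii) and iv) can be sketched in a few lines of NumPy: a robust noise estimate from the median absolute deviation, moment-0 and moment-1 maps above a clipping level, and a nearest-neighbour position-velocity slice along an arbitrary direction. The axis order (velocity, y, x), the clipping scheme and the position-angle convention are assumptions made for illustration; production tools typically interpolate rather than take the nearest pixel.

```python
import numpy as np

def robust_rms(x):
    """Noise estimate from the median absolute deviation (1.4826 * MAD for Gaussian noise)."""
    x = np.asarray(x, dtype=float).ravel()
    return 1.4826 * np.median(np.abs(x - np.median(x)))

def moment_maps(cube, velocities, clip):
    """Moment-0 and moment-1 maps of a (velocity, y, x) cube, ignoring voxels below `clip`."""
    w = np.where(cube > clip, cube, 0.0)
    dv = abs(velocities[1] - velocities[0])
    total = w.sum(axis=0)
    mom0 = total * dv
    with np.errstate(invalid="ignore", divide="ignore"):
        mom1 = (w * velocities[:, None, None]).sum(axis=0) / total  # NaN where empty
    return mom0, mom1

def pv_diagram(cube, center_yx, pa_deg, half_length):
    """Nearest-neighbour position-velocity slice through `center_yx`.

    The slice runs `pa_deg` degrees from the +y pixel axis towards +x (an illustrative
    convention). Returns offsets [pixels] and a (velocity, offset) array.
    """
    offsets = np.arange(-half_length, half_length + 1)
    theta = np.deg2rad(pa_deg)
    y = np.rint(center_yx[0] + offsets * np.cos(theta)).astype(int)
    x = np.rint(center_yx[1] + offsets * np.sin(theta)).astype(int)
    inside = (y >= 0) & (y < cube.shape[1]) & (x >= 0) & (x < cube.shape[2])
    return offsets[inside], cube[:, y[inside], x[inside]]
```

Linking these outputs to the 3-D view (for example drawing the PV slice path inside the rendering) is what turns them into the quantitative visualization described here.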


A second level of quantification can be introduced by having interactive features between the visualization and a plotting library (see, for example, the work in progress by Goodman (2012) and her team related to the GLUE Project).
The idea is to plot quantitative information related to the data and then have a visual representation of that information in the visualization of the data. To give an idea of the benefits of this functionality, a hypothetical example follows: the first step is downloading HI, optical and infrared data, creating star formation rate (SFR) maps and plotting the local SFR values as a function of the HI column density (N_HI) of the corresponding pixel. The plot allows the identification of pixels deviating from the power-law relation between SFR and N_HI. Subsequently, it will be possible to locate possibly deviant pixels by highlighting them in the 3-D visualization. The second step is to examine where they are in the 3-D data in order to assess whether they occupy specific regions, i.e., whether they are coherent in the 3-D data. The third step is retrieving quantitatively the SFR of a specific environment of the data cube under investigation. For that it is necessary to select different zones using the visualization and then to plot the SFR/N_HI of each zone with a different color (for example two regions in a spiral galaxy: the spiral arms and the bulge).
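The first step of this hypothetical example can be sketched as a straight-line fit in log-log space followed by a cut on the residuals. The SFR and N_HI maps are assumed to be co-registered arrays; the 3σ cut and the MAD-based scatter are illustrative choices, and an ordinary least-squares fit (np.polyfit) is used although a robust fit may be preferable when many pixels deviate. The resulting 2-D flag map can be broadcast along the velocity axis to highlight the same lines of sight in the 3-D view.

```python
import numpy as np

def deviant_pixels(n_hi, sfr, k=3.0):
    """Fit log10(SFR) = a + n * log10(N_HI) and flag pixels deviating by more than
    k times the robust (MAD-based) scatter of the residuals."""
    good = (n_hi > 0) & (sfr > 0)                  # logarithms need positive values
    x, y = np.log10(n_hi[good]), np.log10(sfr[good])
    slope, intercept = np.polyfit(x, y, 1)         # ordinary least squares
    resid = y - (intercept + slope * x)
    scatter = 1.4826 * np.median(np.abs(resid - np.median(resid)))
    flags = np.zeros(n_hi.shape, dtype=bool)
    flags[good] = np.abs(resid) > k * scatter
    return slope, intercept, flags

# Synthetic, hypothetical pixel values: a power law with 0.05 dex scatter
# plus three pixels with a ten times higher SFR.
rng = np.random.default_rng(4)
n_hi = 10 ** rng.uniform(20.0, 22.0, 1000)                      # cm^-2
sfr = 1e-30 * n_hi ** 1.4 * 10 ** rng.normal(0.0, 0.05, 1000)   # arbitrary units
sfr[[10, 200, 700]] *= 10.0
slope, intercept, flags = deviant_pixels(n_hi, sfr)
print(f"slope = {slope:.2f}, flagged: {np.flatnonzero(flags)}")
```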

Standalone quantitative visualization is, however, not satisfactory. A synergy, using linked views, with comparative visualization is necessary for assessing the quality of the analysis, such as comparing a tilted-ring model with the data, and highlighting subtle faint structure in the data, as we have shown in section 3.3.2.


4.3. Comparative visualization

In sections 3.3.1 and 3.3.2 we showed how in the case of HI in galaxies one can extract additional information from tilted-ring model fitting.

The visualization tool should also enable an interactive comparison between data and models in order to check the quality of the model provided by the automated algorithm. This is possible by having the model routine embedded in the visualization interface. In fact, a coupling between model fitting and visualization will enable an interactive change of the parameters of the model, such as the rotation curve, column density, and inclination as a function of radius, and the comparison of the new model with the data. Interactive tilted-ring model fitting greatly helps in the analysis of warped galaxies. For example, Sparke et al. (2009) adopted an interactive procedure, using INSPECTOR, to arrive at the final model of NGC 3718 shown in their paper. INSPECTOR is an interactive tilted-ring modeling routine in GIPSY using a comparative visualization tool.

http://projects.iq.harvard.edu/seamlessastronomy/software/glue
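To make the interactive loop concrete, here is a minimal sketch of the sky-plane velocity field of a tilted-ring model, using v = v_sys + v_rot sin(i) cos(θ) for each ring, where θ is the azimuth in the ring plane. It is not INSPECTOR, TiRiFiC or Barolo: it ignores the surface density, thickness and velocity dispersion needed for a full model cube, assigns each pixel to the first matching ring, and uses an illustrative position-angle convention in pixel coordinates. Changing a ring parameter and recomputing the field is the kind of operation an interactive tool would tie to a slider.

```python
import numpy as np

def tilted_ring_velocity_field(shape, center_yx, rings, v_sys):
    """Line-of-sight velocity field of a tilted-ring model (sky plane only, no cube).

    rings: list of (r_in, r_out, v_rot, inc_deg, pa_deg); radii in pixels, velocities in
    km/s. PA is measured counter-clockwise from the +y pixel axis to the receding major
    axis (illustrative convention). Each pixel goes to the first ring whose deprojected
    annulus contains it; pixels outside all rings are NaN.
    """
    yy, xx = np.indices(shape, dtype=float)
    dy, dx = yy - center_yx[0], xx - center_yx[1]
    vfield = np.full(shape, np.nan)
    for r_in, r_out, v_rot, inc_deg, pa_deg in rings:
        inc, pa = np.deg2rad(inc_deg), np.deg2rad(pa_deg)
        x_maj = -dx * np.sin(pa) + dy * np.cos(pa)                   # along major axis
        y_min = (-dx * np.cos(pa) - dy * np.sin(pa)) / np.cos(inc)  # deprojected minor axis
        r = np.hypot(x_maj, y_min)
        sel = (r >= r_in) & (r < r_out) & np.isnan(vfield)
        cos_theta = np.divide(x_maj, r, out=np.zeros_like(r), where=r > 0)
        vfield[sel] = v_sys + v_rot * np.sin(inc) * cos_theta[sel]
    return vfield

# Hypothetical warped disk: the outer ring has a different position angle.
rings = [(0, 10, 120.0, 60.0, 30.0), (10, 20, 150.0, 60.0, 40.0)]
model = tilted_ring_velocity_field((64, 64), (32, 32), rings, v_sys=1000.0)
```

Subtracting such a model from an observed velocity field (or the corresponding model cube from the data) gives the kind of residual view described in the next paragraph.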

The comparison between an observation and a model of a galaxy can be made by examining 3-D renderings of the data and the model in two separate windows, or by showing in one window an overlay of the model on the observation and in another window the difference between them. This separates regularly rotating gas from unusual kinematic features (extra-planar gas, tidal tails, ram-pressure induced structures). In addition, the interface needs to support display windows next to the 3-D rendering with plots in which one can view results of the source analysis, such as the rotation curve.

Comparative visualization can also be extended using models obtained by running N-body simulations (see Barnes and Hibbard 2009; Barnes 2011). Such systematic studies can benefit, in terms of speed and interactivity, from the use of optimized N-body codes running on GPUs (Nyland et al. 2007; Portegies Zwart et al. 2007; Capuzzo-Dolcetta et al. 2013), some of which are publicly available via the Astronomical Multipurpose Software Environment (AMUSE; Pelupessy et al. 2013).


4.4. High-dimensional visualization techniques

High-dimensional data visualization (e.g., TOPCAT (Taylor, 2005)) of the parameter tables will make it possible to obtain a full picture of the characteristics of the data in the catalog. This feature is very important to discover the unexpected. In fact, the catalog paradigm can fail if the number of sources is too large: in general it is possible to retrieve a list of data from catalogs using flags such as names or certain parameters of the objects; it is, however, usually not possible to have a general view of the main parameters of the sources in question. Therefore, a visualization package should be able to download tables that contain the required properties of galaxies (flux, flux error, degree of asymmetry, velocity width, integrated profile shape, etc.) and plot these parameters, allowing the user to find outliers. The user should also be able to mark the data of interest in the plot and download the requested data cube(s) from the catalog, using the interface for further exploration of the 3-D signatures and comparing them with one or more models. This can be achieved using the SAMP protocol and other VO tools.
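Outlier finding on catalog parameters can be sketched with robust z-scores, which are less sensitive than the mean and standard deviation to the very outliers being sought. The column names, values and the threshold k = 4 are hypothetical; a real catalog would be loaded from a VO table rather than generated.

```python
import numpy as np

def robust_z(values):
    """Robust z-scores using the median and the MAD (scaled to sigma for Gaussian data)."""
    values = np.asarray(values, dtype=float)
    med = np.median(values)
    mad = 1.4826 * np.median(np.abs(values - med))
    return (values - med) / mad

def catalog_outliers(table, columns, k=4.0):
    """Row indices whose robust z-score exceeds k in at least one of `columns`."""
    z = np.column_stack([robust_z(table[name]) for name in columns])
    return np.flatnonzero((np.abs(z) > k).any(axis=1))

# Hypothetical catalog columns (illustrative names, not a real catalog schema).
rng = np.random.default_rng(5)
catalog = {
    "w50": rng.normal(200.0, 40.0, 500),       # velocity width [km/s]
    "asymmetry": rng.normal(1.05, 0.05, 500),  # integrated-profile asymmetry ratio
}
catalog["asymmetry"][123] = 2.5                # one strongly lopsided source
print(catalog_outliers(catalog, ["w50", "asymmetry"]))
```

In a SAMP-enabled setup, the resulting row indices could be sent to TOPCAT with the standard table.select.rowList message, and the corresponding cubes opened for 3-D inspection. Note that robust_z divides by zero for a column with constant values.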
4.5. Summary

In this section we have defined the requirements that visualization of HI emission, in the survey era, must satisfy. We briefly summarize them here:

a) astronomical world coordinates in order to combine the visualization of HI data with data obtained at other wavebands;

b) 3-D capabilities (i.e., presence of interactive volume rendering for grid data of dimension < 10^8 voxels and interactive color and opacity function widgets);

c) interactive linking between 1-D/2-D/3-D views;

d) quantification: physical data units, labels, and statistical tools;

e) linked 1-D/2-D/3-D selection tools;

f) 3-D segmentation techniques;

g) interactive smoothing;

h) comparative visualization (multiple views, overlaid visualizations, etc.);

i) tools for generating tilted-ring and N-body models;

j) interoperability with VO tools.


5. Review of state-of-the-art 3-D visualization packages

In the previous section we described in detail all the requirements a visualization tool must satisfy to enable the source analysis that we outlined in section 3.3. A review of the current state of the art of 3-D visualization is very important in order to avoid duplicating the development of rendering algorithms and tools that may already exist. We reviewed current 3-D visualization software with the idea in mind that it has to satisfy the requirements listed in section 4.5 plus the following technical prerequisites. The software must:


i) run on multiple platforms;

ii) have an intuitive interface;

iii) have a Python wrapper for easy introduction of the SAMP protocol;

iv) have a high level of modularity in the source code;

v) have proper documentation and long-term maintainability (i.e., presence of a significant user and developer community).


Many rendering algorithms and tools exist, but we restricted the detailed review to a short list of publicly available, open-source and currently maintained packages with 3-D interactive rendering capabilities:

1) Paraview (Moreland et al., 2007): a general-purpose multi-platform data analysis and visualization application. The ParaView project started in 2000 as a collaborative effort between Kitware Inc. and Los Alamos National Laboratory.

2) 3DSlicer (Fedorov et al. 2012): a software package for visualization and image analysis of medical data. It is natively designed to be available on multiple platforms.

3) Mayavi2 (Ramachandran and Varoquaux 2010, 2012): a general-purpose, cross-platform tool for 2-D and 3-D scientific data visualization.

4) ImageVis3D (Thomas Fogal 2010): a new volume rendering program developed by the NIH/NIGMS Center for Integrative Biomedical Computing (CIBC). The software is multi-platform and scalable.

For each package we performed a detailed review study in two steps:

i) a software user-friendliness survey: we tested the four packages by inspecting and analysing the HI emission of WEIN069 and NGC2403 (shown in Figs. 2 and 5). We performed a survey by asking 15 radio astronomers to evaluate the intuitiveness and interactivity of the different features offered by each package, using WEIN069 as the test data set. The evaluation involved each participant filling out a questionnaire after one hour of use of the packages. In all cases the latest stable version of the software was used with the following hardware set-up: a Linux laptop (Ubuntu 14.04 LTS) equipped with an Intel i7 2.60 GHz CPU, an NVIDIA GeForce GTX 860M GPU, 16 GB of DDR3 1.6 GHz RAM, and a 15.6 inch monitor with a resolution of 1920 x 1080.

ii) a source code evaluation: we performed a detailed study of the full source code, the level of modularity, and the available documentation for developers.


5.1. Review results 

The resulting ranking of the packages is shown in Table 1. In addition, we provide a detailed list of pros and cons for each package in Table 2.

We can divide the packages into two classes: i) Paraview and 3DSlicer; ii) Mayavi2 and ImageVis3D. The software in the first group has many features, while the second group mainly offers qualitative visualization. The users noted that the interfaces offered by Paraview and 3DSlicer are complex, but at the same time, most of the users found Paraview rather intuitive. The intuitiveness (i.e., the learning time) ranking shown in Table 1 obviously also depends on the experience of the users with similar visualization software.

The review highlighted that the users experienced a major lack of functionality in all four packages for: displaying labels with proper astronomical coordinates; 1-D visualization (e.g., line profiles); interactive smoothing; simple editing or blanking, and specific operations such as constructing a position-velocity diagram along a specified spatial axis; and comparative visualization (e.g., overlaid 1-D profiles and overlaid 2-D contour plots on another image). This is not a surprising result. In fact, the packages considered in this section are aimed towards general or medical visualization purposes and lack the specialized visualization representations and interaction aspects common in radio astronomy. On the other hand, they do have advanced rendering capabilities, such as those provided by the Visualization Toolkit (VTK), and a modern, multi-platform, reliable interface based on Qt. For example, the packages enable the user to save the whole working session in a bundle: the data, the visualization, and the module structure used for the analysis.

     Requirement / prerequisite                 Paraview   3DSlicer   Mayavi2    ImageVis3D
a    astronomical world coordinates             missing    missing    missing    missing
b    3-D capabilities and color transfer fn.    7|27|67    0|7|93     80|20|0    27|27|47
c    linked 1-D/2-D/3-D views                   20|73|7    7|20|73    60|40|0    missing
d    data probe, labels and statistics          47|53|0    33|67|0    100|0|0    missing
e    linked 1-D/2-D/3-D selection tools         27|60|13   7|40|53    67|33|0    missing
f    3-D segmentation                           7|20|73    13|20|67   47|27|27   7|47|47
g    interactive smoothing                      missing    missing    missing    missing
h    comparative views                          27|53|20   33|47|20   missing    missing
i    tilted-ring/N-body model routines          missing    missing    missing    missing
j    SAMP and VO connectivity                   missing    missing    missing    missing
i    multi-platform software                    ■          ■          ■          ■
ii   intuitiveness of the interface             13|47|40   67|33|0    60|20|20   13|20|67
iii  Python wrapper                             ■          ■          ■          ■
iv   modularity of the software                 ■          ■          ■          ■
v    documentation/long-term maintainability    ■          ■          ■          ■

Legend: missing = feature not available; x|y|z = percentage of the surveyed users rating the feature not satisfactory | satisfactory | good.

Table 1: A ranking of several 3-D visualization packages is shown. In the top part of the table, the letters in the first column refer to the summarized requirements in section 4.5. In the bottom part, the roman numerals refer to the technical prerequisites listed in this section. The percentages represent the ranking based on a user-test survey performed with 15 radio astronomers. Note that this software ranking is oriented towards the visualization of HI data (grid volume of dimension < 10^8 voxels) in a desktop environment.

Paraview
  Pros:
  • CPU/GPU rendering based on the Visualization Toolkit (VTK);
  • ability to connect to a server to do the computation;
  • editable interface with unlimited 2-D/3-D views;
  • linked 1-D/2-D/3-D views;
  • cropping and selection tools;
  • 3-D segmentation techniques, i.e., isosurfaces;
  • ability to perform statistics on the user selection;
  • high level of modularity in the source code;
  • embedded Python console in the interface for fast interaction with the source code;
  • presence of documentation both for users and developers.
  Cons:
  • the interface is complex;
  • astronomical world coordinates and labels not displayable;
  • the interface is not optimized for 1-D and 2-D visualization;
  • interactive smoothing missing.

3DSlicer
  Pros:
  • CPU/GPU rendering based on VTK;
  • interface is also optimized for 2-D visualization of channel maps;
  • high level of linking between 2-D and 3-D views;
  • interactive cropping and selection editor tools;
  • ability to perform statistics on the user selection;
  • 3-D segmentation techniques, i.e., isosurfaces;
  • high level of modularity in the source code;
  • embedded Python console in the interface for fast interaction with the source code;
  • presence of documentation both for users and developers.
  Cons:
  • the interface is very complex and not intuitive;
  • astronomical world coordinates and labels not displayable;
  • 1-D visualization missing;
  • 2-D contour plots missing;
  • interactive smoothing missing.

Mayavi2
  Pros:
  • CPU rendering based on TVTK (Python wrapper for VTK);
  • cropping and selection tools;
  • 3-D segmentation techniques, i.e., isosurfaces;
  • contour plots;
  • a simple and clean scripting interface in Python, easy integration with other Python libraries.
  Cons:
  • the interface is not stable;
  • presence of only CPU rendering capabilities; the frame rate is low, fps < 5, for data cubes bigger than 10^9 voxels;
  • color transfer function widget is not easy to use;
  • astronomical world coordinates and labels not displayable;
  • 1-D visualization missing;
  • interactive smoothing missing;
  • lack of statistics tools.

ImageVis3D
  Pros:
  • very light, fast, and intuitive interface;
  • GPU rendering;
  • 3-D segmentation techniques, i.e., isosurfaces.
  Cons:
  • the long-term maintainability of the rendering code is unknown;
  • astronomical world coordinates and labels not displayable;
  • 1-D and 2-D visualization missing;
  • interactive smoothing missing;
  • lack of statistics tools;
  • lack of documentation.

Table 2: A list of pros and cons relative to the four packages is presented. The advantages and disadvantages listed form a detailed description of the feedback provided by the authors and the users of the software survey shown in Table 1.

At the moment, all the packages listed lack multi-volume rendering. Multi-volume rendering is the operation of rendering two or more volumes in the same space. This feature is necessary for enabling very fast 3-D overlaid comparative visualization.

5.2. Visualization of HI and 3DSlicer

Despite the complexity of the interface, we chose to adopt 3DSlicer as the base platform for the development of an HI visualization tool. Our choice has been the result of considering various factors, such as the presence of adequate documentation, the number of people actively working on the software, and the quantitative features already implemented in the interface. These three main factors make 3DSlicer the best solution for us. In fact, the medical visualization needs are indeed very close to the astronomical ones. For example, the interface layout and the navigation through the data are already optimized for parallel 2-D visualizations (e.g., movies of channel maps). The following features need to be added to 3DSlicer in order to fulfill the requirements described in section 4:

i) proper visualization of astronomical data cubes using 
the data formats FITS, HDF5, CASA, and Miriad; 

ii) enabling interactive smoothing in all three dimensions 
and multi-scale analysis, such as wavelet lifting; 

iii) generation of flux density profiles, moment maps and 
position-velocity diagrams linked with the 3-D view; 

iv) interactive 3-D selection of HI sources; 

v) interactive HI data modeling coupled to visualization; 

vi) introduction of the SAMP protocol to enable
interoperability with Topcat and other VO tools and
catalogs.
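The moment maps and flux density profiles in the list above are standard derived products. As a minimal sketch, assuming NumPy, a cube ordered as (channel, y, x), uniformly spaced channel velocities, and a boolean source mask (for example one produced by a source finder), they can be computed as follows. The function name is hypothetical, and unit conversions (e.g. Jy/beam to Jy) are deliberately omitted.

```python
import numpy as np

def moments_and_profile(cube, velocities, mask):
    """Return the moment-0 map, moment-1 map and spectral profile.

    cube       : array (n_chan, ny, nx) of intensities
    velocities : array (n_chan,) of channel velocities (uniform spacing assumed)
    mask       : boolean array shaped like `cube`, True inside the source
    """
    masked = np.where(mask, cube, 0.0)
    dv = abs(velocities[1] - velocities[0])        # channel width
    weights = masked.sum(axis=0)                   # summed intensity per pixel
    mom0 = weights * dv                            # integrated intensity map
    with np.errstate(invalid="ignore", divide="ignore"):
        # intensity-weighted mean velocity per pixel
        mom1 = (masked * velocities[:, None, None]).sum(axis=0) / weights
    mom1[weights == 0] = np.nan                    # undefined without emission
    profile = masked.sum(axis=(1, 2))              # summed intensity per channel
    return mom0, mom1, profile

# Usage on a tiny synthetic cube: emission in one pixel over three channels
velocities = np.arange(5) * 10.0                   # 0, 10, 20, 30, 40 km/s
cube = np.zeros((5, 3, 3))
cube[1:4, 1, 1] = [1.0, 2.0, 1.0]
mom0, mom1, profile = moments_and_profile(cube, velocities, cube > 0)
# mom0[1, 1] == 40.0, mom1[1, 1] == 20.0, profile == [0, 1, 2, 1, 0]
```

Applying the mask before computing moment 1 matters in practice: negative noise values in unmasked channels would otherwise make the weighted mean unstable.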


6. Concluding Remarks 


HI observations are moving into the era of big surveys.
Upcoming HI surveys, such as those envisaged with
APERTIF and ASKAP, will deliver big data sets, leading
the radio astronomer into the regime of the so-called
Fourth Paradigm (i.e., data-intensive scientific discovery;
Hey et al. 2009).


APERTIF is expected to start its observing campaign
of the northern sky in 2017. The daily APERTIF data
cube will have dimensions of 2048 x 2048 x 16384 ~ 68.7 x
10^9 voxels, and the expected number of HI source
detections is ~100 per day. WALLABY will have similar
characteristics. This large volume of data creates new
needs in terms of tools and algorithms, which must
exploit new ideas and solutions for storage, data reduction,
visualization, and analysis to obtain scientific results.

http://www.vtk.org/
http://qt-project.org/

Visual analytics (section 2.6), the combination of automated
data processing with human reasoning, creativity, and
intuition, supported by interactive visualization, is one of
the prime methodologies for putting the human
in the investigation loop. In this paper, we defined the
visualization prerequisites and future perspectives for
applying this paradigm to HI observations, focusing on the
introduction of 3-D visualization into the process of source
finding and analysis. Indeed, current astronomy
visualization software has very limited 3-D capabilities for
grid data (section 2), while general-purpose visualization
software (section …) is not aimed at the analysis of HI
data.

In this paper we showed the following.

i) More than 99% of the voxels in the HI datasets that
APERTIF will deliver are dominated by noise, and the
sources are hidden in it (see Fig. …). Current source-finder
software can extract them with high reliability
and completeness (Whiting 2012; Serra et al. 2015).
The typical volume of an individual source will
be 50^3 = 1.25 x 10^5 voxels (up to 512^3 ~ 1.3 x 10^8
in the case of occasional large galaxies), reducing the
storage, I/O bandwidth, and computational demands
for visualization to a level accessible on desktops and
laptops. The predicted weekly data rate, on the other
hand, is high (~10^3 sources). Fortunately, only a
subset of these (2-3 sources per day) will be highly
resolved (more than 10 resolution elements) or show
complex features such as tails and extra-planar gas. A
powerful interactive visualization tool will be needed
for fast inspection and analysis of these objects.
ii) The analysis of the sources, for example producing moment
maps and rotation curves, will also be done in
an automated way. In particularly complex cases, human
interaction will be necessary to steer the automated
algorithms through the data volume and to provide
immediate feedback on the quality of the results (see
section 3.3.2). Visualization tools with supervised
semi-automated analysis algorithms will therefore be needed,
because refined data products must be produced in minimal
time while maintaining the same level of quality. For
example, deriving the rotation curve of a galaxy involves
building a so-called tilted-ring model, which is then
compared to the data. This process has been converted into an
automatic algorithm. However, significant kinematic
features that deviate from regular circular rotation (e.g.,
tidal tails, see Fig. …) will be present in part of the
data. Current algorithms cannot automatically
flag these features for the analysis. Therefore, human
intervention is necessary to separate the regularly rotating
disk from the other kinematic features, and to
feed the fitting algorithm with this selection, so that
the user can judge the results quantitatively.

iii) In section 3.3 we showed that 3-D visualization can
enable an immediate overview of the kinematics of a
galaxy, leading to an improved understanding of the
coherence in the data. Moreover, a high level of
interactivity in all visualization aspects (rendering, smoothing,
retrieval of quantitative information, and comparative
features) will be key to enabling fast inspection of the
data. On the other hand, volume rendering has limitations
due to current 2-D input and output hardware, for
example projection issues and the impossibility of moving
the cursor pixel by pixel. Adding 1-D/2-D views linked
to the 3-D representation resolves these limitations.
Combining these views with high-dimensional visualization
techniques, which can help in finding outliers and
patterns in the oceans of data, is also necessary.
iv) In section … we identified the requirements for the
visualization and analysis of HI in galaxies: interactive
visualization with quantitative and comparative
capabilities, 3-D selection techniques, and supervised
semi-automated analysis. Moreover, the source code
must have the following characteristics to enable
collaborative work: open, modular, well documented,
and well maintained. After a study of the state of
the art of open-source, actively maintained
visualization packages capable of rendering grid data
(see section …), we adopted 3DSlicer as a
platform for developing a fully interactive desktop HI
data visualization tool with quantitative and comparative
features (section 5.2). These techniques can also
be used for other astronomical datasets, such as the 3-D
datasets provided by recent Integral Field Unit (IFU)
observations (Sánchez et al. 2012; Karman et al. 2014;
Richard et al. 2015). In that case, collaborative work
is necessary to identify the key features needed to
provide quantitative visualization.
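The tilted-ring step and the role of the human selection described above can be illustrated for a single ring with fixed geometry. In the usual tilted-ring formulation, ignoring radial and vertical motions, the line-of-sight velocity is v_los = v_sys + v_rot cos(theta) sin(i), where theta is the azimuth in the disk plane and i the inclination. With the centre, inclination and orientation fixed, v_sys and v_rot enter linearly, so an ordinary least-squares fit suffices. This is a minimal NumPy sketch, not the automated fitting software referred to in the text; the function names and the simplified orientation angle (measured from the +x axis rather than the astronomical position angle) are our own assumptions. The `keep` array stands in for the user's interactive selection that excludes features such as tidal tails.

```python
import numpy as np

def ring_design(x, y, phi, incl):
    """Deproject sky offsets (x, y) from the galaxy centre for one ring.

    phi  : angle of the kinematic major axis from the +x axis (radians);
           a simplified convention, not the astronomical position angle
    incl : inclination (radians), 0 = face-on
    Returns the in-disk radius R and the factor cos(theta) * sin(incl)
    that multiplies v_rot in v_los = v_sys + v_rot * cos(theta) * sin(incl).
    """
    xp = x * np.cos(phi) + y * np.sin(phi)     # coordinate along the major axis
    yp = -x * np.sin(phi) + y * np.cos(phi)    # coordinate along the minor axis
    X, Y = xp, yp / np.cos(incl)               # undo inclination foreshortening
    R = np.hypot(X, Y)
    with np.errstate(invalid="ignore", divide="ignore"):
        cos_theta = np.where(R > 0, X / R, 0.0)
    return R, cos_theta * np.sin(incl)

def fit_ring(x, y, vlos, phi, incl, keep):
    """Least-squares v_sys and v_rot for one ring, using only `keep` points."""
    _, f = ring_design(x[keep], y[keep], phi, incl)
    A = np.column_stack([np.ones_like(f), f])  # model is linear in (v_sys, v_rot)
    (v_sys, v_rot), *_ = np.linalg.lstsq(A, vlos[keep], rcond=None)
    return v_sys, v_rot

# Usage: a synthetic ring whose first five points belong to a "tail"
incl, phi = np.radians(60.0), np.radians(30.0)
theta = np.radians(np.arange(0.0, 360.0, 10.0))   # 36 azimuths in the disk
Xd, Yd = 10.0 * np.cos(theta), 10.0 * np.sin(theta)
xp, yp = Xd, Yd * np.cos(incl)                     # project the ring onto the sky
x = xp * np.cos(phi) - yp * np.sin(phi)
y = xp * np.sin(phi) + yp * np.cos(phi)
vlos = 1000.0 + 150.0 * np.cos(theta) * np.sin(incl)
vlos[:5] += 80.0                                   # kinematically distinct gas
keep = np.ones_like(vlos, dtype=bool)
keep[:5] = False                                   # the user's selection
v_sys, v_rot = fit_ring(x, y, vlos, phi, incl, keep)
v_sys_all, _ = fit_ring(x, y, vlos, phi, incl, np.ones_like(keep))
```

With the selection applied the fit recovers the input systemic and rotation velocities, whereas fitting all points lets the "tail" bias v_sys by roughly 11 km/s in this toy case, which is the kind of error the supervised, visualization-driven selection is meant to prevent.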


In conclusion, the success of a visualization tool
depends heavily on the number of people using it over its
lifetime. The lifetime of a software package depends
on several factors, such as usability, maintainability, and
whether it has been developed with good insight into the
subtle aspects of the data and their interpretation. KARMA
is a perfect example of a successful package: developed in
the mid-1990s, it is still widely used by radio astronomers
today. Our aim is to achieve an analogous result by exploiting
current hardware and algorithmic paradigms, focusing
on the linking between 2-D and 3-D visualization,
quantitative/comparative features, and high-dimensional
visualization.
https://www.youtube.com/watch?v=sS_5LrOS5bo
https://www.youtube.com/watch?v=yLjW9nbd08g